Assignment 3: Constrained Optimization, SVMs
This material, no matter whether in printed or electronic form, may be used for personal and non-commercial educational use only. Any reproduction of this material, no matter whether as a whole or in parts, no matter whether in printed or in electronic form, requires explicit prior acceptance of the authors.
Automatic Testing Guidelines
Automatic unit testing requires you to submit a notebook which contains strictly defined objects: unified shapes, dtypes, variable names and more.
Within the notebook, we provide detailed instructions which you should follow in order to maximise your final grade.
Name your notebook properly, follow the pattern in the template name:
Assignment_N_NameSurname_matrnumber
- N - number of assignment
- NameSurname - your full name where every part of the name starts with a capital letter, no spaces
- matrnumber - your student number on your ID card (with k, potentially with a leading zero)
Don't add any cells but use the ones provided by us. All cells have a unique ID so that the unit test can find it, so please do not add or remove any cell!
Always make sure that implemented functions have the correct output and that given variables contain the correct data type. In the description of every function you can find information on what datatype an output should have; stick to that in order to minimize conflicts with the unittest. Don't import any packages other than those listed in the cell with the "imports" tag.
Questions are usually multiple choice (unless the task description says otherwise) and can be answered by setting the given variables to either "True" or "False". "None" is counted as a wrong answer in any case!
Note: Never use variables you defined in another cell directly in your functions; always pass them to the function as a parameter. In the unit test, they won't be available either. If you want to make sure that everything is executable for the unit test, try executing cells/functions individually (instead of running the whole notebook).
Task 1: Constrained optimization
Consider the following primal (constrained convex optimization) problem:
\begin{align*} \text{Minimize} \qquad &f(w_1,w_2) = 2\,(w_1^4+w_2^4) \\ \text{subject to} \qquad &h(w_1,w_2) = 4-w_1+w_2 \le 0 \\ \end{align*}
We try to solve it via two ways. Perform the following tasks:
- Compute the Lagrangian $L(w_1,w_2,\alpha)$.
- Calculate its derivatives with respect to $w_1$ and $w_2$ and calculate the zeros of these derivatives ($w_1^*,w_2^*$).
First way:
- Solve the problem using the KKT conditions.
Second way:
- Write down the dual problem and solve it directly without(!) the help of the KKT-conditions. Hint: You need the derivative of $\mathcal{L} = L(w_1^*,w_2^*,\alpha)$ with respect to $\alpha$.
For your calculation use the given notation.
Calculation 1.1 (2 Points):
YOUR ANSWER HERE
Define the objective function and constraint:
$ f(w_1, w_2) = 2\,(w_1^4 + w_2^4)$, $\quad h(w_1, w_2) = 4 - w_1 + w_2$
Substitute $f(w_1, w_2)$ and $h(w_1, w_2)$ into the Lagrangian:
$L(w_1, w_2, \alpha) = 2\,(w_1^4 + w_2^4) + \alpha (4 - w_1 + w_2)$
Thus, the Lagrangian is:
$L(w_1, w_2, \alpha) = 2w_1^4 + 2w_2^4 + \alpha (4 - w_1 + w_2)$
Calculation 1.2 (5 Points):
YOUR ANSWER HERE
Derivative of $L(w_1, w_2, \alpha)$ with respect to $w_1$:
$\frac{\partial L}{\partial w_1} = 8w_1^3 - \alpha$
Set the derivative to zero to find critical points:
$8w_1^3 - \alpha = 0 \quad \implies \quad \alpha = 8w_1^3$
Derivative of $L(w_1, w_2, \alpha)$ with respect to $w_2$:
$\frac{\partial L}{\partial w_2} = 8w_2^3 + \alpha$
Set the derivative to zero to find critical points:
$8w_2^3 + \alpha = 0 \quad \implies \quad \alpha = -8w_2^3$
Equating the expressions for $\alpha$:
$8w_1^3 = -8w_2^3 \quad \implies \quad w_1^3 + w_2^3 = 0$
Thus:
$w_2 = -w_1$
Calculation 1.3 (8 Points):
YOUR ANSWER HERE
Step 1: KKT Conditions
The KKT conditions for the given problem are:
- Stationarity: $\nabla f(w_1, w_2) + \alpha \nabla h(w_1, w_2) = 0$
- Primal feasibility: $h(w_1, w_2) \leq 0$
- Dual feasibility: $\alpha \geq 0$
- Complementary slackness: $\alpha h(w_1, w_2) = 0$
Step 2: Solve the Stationarity Condition
The gradients are:
$\nabla f(w_1, w_2) = [8w_1^3, 8w_2^3]^T, \quad \nabla h(w_1, w_2) = [-1, 1]^T$
Stationarity condition:
$\begin{bmatrix} 8w_1^3 \\ 8w_2^3 \end{bmatrix} + \alpha \begin{bmatrix} -1 \\ 1 \end{bmatrix} = 0$
This gives two equations:
- $8w_1^3 - \alpha = 0 \quad \implies \quad \alpha = 8w_1^3$
- $8w_2^3 + \alpha = 0 \quad \implies \quad \alpha = -8w_2^3$
From $\alpha = 8w_1^3$ and $\alpha = -8w_2^3$, we have:
$8w_1^3 = -8w_2^3 \quad \implies \quad w_1^3 + w_2^3 = 0 \quad \implies \quad w_2 = -w_1$
Step 3: Check Feasibility and Solve for $(w_1, w_2)$
Substitute $w_2 = -w_1$ into the constraint $h(w_1, w_2) \leq 0$:
$4 - w_1 + (-w_1) \leq 0 \quad \implies \quad 4 - 2w_1 \leq 0 \quad \implies \quad w_1 \geq 2$
If $\alpha = 0$, stationarity would give $w_1 = w_2 = 0$, which violates $w_1 \geq 2$. Hence $\alpha > 0$, and complementary slackness $\alpha\, h(w_1, w_2) = 0$ forces the constraint to be active, $4 - 2w_1 = 0$, so:
$w_1 = 2, \quad w_2 = -2$
Step 4: Determine $\alpha$:
$\alpha = 8w_1^3 = 8(2)^3 = 64$
Thus, the solution is:
$w_1^* = 2, \quad w_2^* = -2, \quad \alpha^* = 64$
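As a quick sanity check of the KKT solution (purely illustrative, outside the graded cells, and assuming SciPy is available), the primal problem can also be solved numerically:

```python
# Illustrative check: solve the primal problem numerically and compare the
# result with the analytic KKT solution (w1*, w2*) = (2, -2), f* = 64.
import numpy as np
from scipy.optimize import minimize

f = lambda w: 2 * (w[0]**4 + w[1]**4)
# SciPy expects inequality constraints as g(w) >= 0, so
# h(w) = 4 - w1 + w2 <= 0 is passed as -h(w) >= 0.
cons = [{"type": "ineq", "fun": lambda w: -(4 - w[0] + w[1])}]

res = minimize(f, x0=[3.0, -3.0], constraints=cons)
print(res.x, res.fun)  # should be close to [2, -2] and 64
```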
Calculation 1.4 (10 Points):
YOUR ANSWER HERE
Second Way: Solve the Dual Problem
Step 1: Dual Function
The dual function is defined as:
$g(\alpha) = \inf_{w_1, w_2} L(w_1, w_2, \alpha)$
Substitute $L(w_1, w_2, \alpha)$:
$L(w_1, w_2, \alpha) = 2w_1^4 + 2w_2^4 + \alpha(4 - w_1 + w_2)$
Minimize $L(w_1, w_2, \alpha)$ with respect to $w_1$ and $w_2$:
From earlier:
$\frac{\partial L}{\partial w_1} = 8w_1^3 - \alpha \quad \implies \quad w_1 = \left(\frac{\alpha}{8}\right)^{1/3}$
$\frac{\partial L}{\partial w_2} = 8w_2^3 + \alpha \quad \implies \quad w_2 = -\left(\frac{\alpha}{8}\right)^{1/3}$
Substitute these into $L(w_1, w_2, \alpha)$:
$L\left(\left(\frac{\alpha}{8}\right)^{1/3}, -\left(\frac{\alpha}{8}\right)^{1/3}, \alpha\right) = 2\left(\frac{\alpha}{8}\right)^{4/3} + 2\left(\frac{\alpha}{8}\right)^{4/3} + \alpha\left(4 - 2\left(\frac{\alpha}{8}\right)^{1/3}\right)$
Simplify, using $\alpha \left(\frac{\alpha}{8}\right)^{1/3} = 8 \left(\frac{\alpha}{8}\right)^{4/3}$:
$g(\alpha) = 4\left(\frac{\alpha}{8}\right)^{4/3} + 4\alpha - 16\left(\frac{\alpha}{8}\right)^{4/3} = 4\alpha - 12\left(\frac{\alpha}{8}\right)^{4/3}$
Step 2: Maximize $g(\alpha)$ Subject to $\alpha \geq 0$
Solve $\frac{dg}{d\alpha} = 0$ to find the optimal $\alpha$:
$\frac{dg}{d\alpha} = 4 - 12 \cdot \frac{4}{3} \cdot \frac{1}{8} \left(\frac{\alpha}{8}\right)^{1/3} = 4 - 2\left(\frac{\alpha}{8}\right)^{1/3} = 0$
Solve for $\alpha$: $\left(\frac{\alpha}{8}\right)^{1/3} = 2$, yielding:
$\alpha = 64$
Substitute $\alpha = 64$ back to find $w_1$ and $w_2$:
$w_1 = \left(\frac{\alpha}{8}\right)^{1/3} = 2, \quad w_2 = -2$
Thus, the solution is:
$w_1^* = 2, \quad w_2^* = -2, \quad \alpha^* = 64$
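Strong duality can also be double-checked numerically (again purely illustrative and assuming SciPy): evaluate $g(\alpha) = \inf_{w_1, w_2} L(w_1, w_2, \alpha)$ by an inner minimization and maximize it over $\alpha \geq 0$; the dual optimum should coincide with the primal optimum $f(2, -2) = 64$.

```python
# Illustrative check: evaluate g(alpha) = inf_{w1,w2} L by an inner numerical
# minimization, then maximize g over alpha >= 0.
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def g(alpha):
    L = lambda w: 2 * (w[0]**4 + w[1]**4) + alpha * (4 - w[0] + w[1])
    return minimize(L, x0=[0.0, 0.0]).fun

res = minimize_scalar(lambda a: -g(a), bounds=(0.0, 200.0), method="bounded")
print(res.x)    # should be close to alpha* = 64
print(-res.fun) # should be close to 64 = f(2, -2), confirming strong duality
```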
Suppose we replace in the primal optimization problem of C-SVMs the penalty term $\|\xi\|_1=\sum_{i=1}^l \xi_i$ with $\|\xi\|_2^2=\sum_{i=1}^l \xi_i^2$ (that is we use the quadratic hinge loss instead). Thus, the primal problem we are considering is given as follows (with $C>0$):
\begin{align*} \text{Minimize} \qquad &\frac{1}{2}\|\mathbf{w}\|^2+C \sum_{i=1}^l \xi_i^2 \\ \text { subject to } \qquad &-[y_i\left(\mathbf{w} \cdot \mathbf{x}_i-b\right) -1+\xi_i] \leq 0 \\ \text{and} \qquad &-\xi_i \leq 0 \ \ \textrm{for} \ \ i=1,...,l \end{align*}
for all $i=1,...,l$. Give the associated dual optimization problem by making use of the KKT-Theorem and perform the following tasks:
- Why can the KKT-Theorem be applied?
- Calculate the Lagrange function $L$.
- Calculate its derivatives (with respect to $w_i$, $b$ and $\xi_i$) and compute their zeros.
- Write down the dual function (named $\cal L$ in the slides) and the corresponding dual problem, depending on $2l$ Lagrange multipliers. (You don't need to solve the dual problem.)
- Simplify the dual problem so that it only depends on $l$ Lagrange variables. Hint: The maximization with respect to $\lambda_i$ can be done quite easily. (Again: You don't need to solve the dual problem. Just find the simplified form.)
Please provide reasoning and explanations in full sentences.
Calculation 1.5 (4 Points):
YOUR ANSWER HERE
Task 1: Why can the KKT Theorem be applied?
Explanation
The Karush-Kuhn-Tucker (KKT) Theorem can be applied if the following conditions hold:
- The primal optimization problem is convex, i.e., the objective function and feasible region are convex.
- There exist feasible points that satisfy the constraints (the feasible region is non-empty).
- Slater’s condition is satisfied for the inequality constraints, i.e. there exists a feasible point at which all inequality constraints hold strictly.
In this problem:
- The objective function $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^l \xi_i^2$ is convex (indeed strictly convex in $\mathbf{w}$ and $\boldsymbol{\xi}$), because it is a sum of positive quadratic terms in $\mathbf{w}$ and $\xi_i$.
- The constraints are linear in $\mathbf{w}$, $b$, and $\xi_i$, which defines a convex feasible region.
- Slater’s condition holds: e.g. for $\mathbf{w} = \mathbf{0}$, $b = 0$ and $\xi_i = 2$, all constraints are satisfied with strict inequality.
Thus, the KKT conditions can be applied.
Calculation 1.6 (3 Points):
YOUR ANSWER HERE
Task 2: Calculate the Lagrangian L
Lagrangian Formulation
The Lagrangian for this problem includes the primal objective function and Lagrange multipliers for each constraint. Let:
- $\alpha_i \geq 0$ be the Lagrange multipliers for the margin constraint: $-[y_i (\mathbf{w} \cdot \mathbf{x}_i - b) - 1 + \xi_i] \leq 0$,
- $\lambda_i \geq 0$ be the Lagrange multipliers for the positivity constraint: $-\xi_i \leq 0$.
The Lagrangian is:
$L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\lambda}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^l \xi_i^2 - \sum_{i=1}^l \alpha_i \big[ y_i (\mathbf{w} \cdot \mathbf{x}_i - b) - 1 + \xi_i \big] - \sum_{i=1}^l \lambda_i \xi_i$.
Calculation 1.7 (6 Points):
YOUR ANSWER HERE
Task 3: Calculate derivatives and compute their zeros
Derivative with respect to $\mathbf{w}$:
$\frac{\partial L}{\partial \mathbf{w}} = \mathbf{w} - \sum_{i=1}^l \alpha_i y_i \mathbf{x}_i$.
Set $\frac{\partial L}{\partial \mathbf{w}} = 0$:
$\mathbf{w} = \sum_{i=1}^l \alpha_i y_i \mathbf{x}_i$.
Derivative with respect to b:
$\frac{\partial L}{\partial b} = \sum_{i=1}^l \alpha_i y_i$.
Set $\frac{\partial L}{\partial b} = 0$:
$\sum_{i=1}^l \alpha_i y_i = 0$.
Derivative with respect to $\xi_i$:
$\frac{\partial L}{\partial \xi_i} = 2C \xi_i - \alpha_i - \lambda_i$.
Set $\frac{\partial L}{\partial \xi_i} = 0$:
$\xi_i = \frac{\alpha_i + \lambda_i}{2C}$.
Since $\alpha_i \geq 0$ and $\lambda_i \geq 0$, this yields $\xi_i \geq 0$ automatically, so the positivity constraints $-\xi_i \leq 0$ are satisfied at every stationary point.
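The stationarity conditions can be confirmed symbolically for a single training example ($l=1$); this is an optional check (assuming SymPy is available), using the sign convention $L = f + \sum_i \mu_i g_i$ for constraints written in $g_i \leq 0$ form:

```python
# Optional symbolic check of the stationarity conditions for one example (l = 1).
import sympy as sp

w, b, xi, x, y = sp.symbols("w b xi x y", real=True)
alpha, lam, C = sp.symbols("alpha lambda C", positive=True)

# Lagrangian with each multiplier paired to its "<= 0" constraint.
L = sp.Rational(1, 2) * w**2 + C * xi**2 \
    - alpha * (y * (w * x - b) - 1 + xi) - lam * xi

print(sp.diff(L, w))   # stationarity in w:  w - alpha*y*x = 0
print(sp.diff(L, b))   # stationarity in b:  alpha*y = 0 (a sum over i in general)
print(sp.diff(L, xi))  # stationarity in xi: 2*C*xi - alpha - lam = 0
```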
Calculation 1.8 (4 Points):
YOUR ANSWER HERE
Task 4: Write down the dual function and dual problem
Dual Function
The dual function is defined as:
$\mathcal{L}(\boldsymbol{\alpha}, \boldsymbol{\lambda}) = \inf_{\mathbf{w}, b, \boldsymbol{\xi}} L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\lambda})$
Substitute the primal optimality conditions into $L$:
- Substitute $\mathbf{w} = \sum_{i=1}^l \alpha_i y_i \mathbf{x}_i$,
- Substitute $\sum_{i=1}^l \alpha_i y_i = 0$,
- Substitute $\xi_i = \frac{\alpha_i + \lambda_i}{2C}$.
With these substitutions, $\frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^l \alpha_i y_i (\mathbf{w} \cdot \mathbf{x}_i) = -\frac{1}{2} \sum_{i=1}^l \sum_{j=1}^l \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$, the $b$-terms vanish because $\sum_{i=1}^l \alpha_i y_i = 0$, and $C \sum_{i=1}^l \xi_i^2 - \sum_{i=1}^l (\alpha_i + \lambda_i)\,\xi_i = -\frac{1}{4C} \sum_{i=1}^l (\alpha_i + \lambda_i)^2$.
Dual Problem
The dual function is:
$\mathcal{L}(\boldsymbol{\alpha}, \boldsymbol{\lambda}) = \sum_{i=1}^l \alpha_i - \frac{1}{2} \sum_{i=1}^l \sum_{j=1}^l \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) - \frac{1}{4C} \sum_{i=1}^l (\alpha_i + \lambda_i)^2$.
Thus, the dual problem, in the $2l$ Lagrange multipliers $\alpha_i, \lambda_i$, is:
$\text{Maximize:} \quad \mathcal{L}(\boldsymbol{\alpha}, \boldsymbol{\lambda})$,
$\text{subject to:} \quad \sum_{i=1}^l \alpha_i y_i = 0, \quad \alpha_i \geq 0, \quad \lambda_i \geq 0 \quad \text{for } i = 1, \dots, l$.
Calculation 1.9 (8 Points):
YOUR ANSWER HERE
Task 5: Simplify the dual problem to depend on $l$ variables
Simplification
In $\mathcal{L}(\boldsymbol{\alpha}, \boldsymbol{\lambda})$, the multipliers $\lambda_i$ appear only in the term $-\frac{1}{4C} \sum_{i=1}^l (\alpha_i + \lambda_i)^2$. For fixed $\alpha_i \geq 0$, this term is maximized by choosing $\lambda_i = 0$, which eliminates the $\lambda_i$. The dual problem then depends only on the $l$ variables $\alpha_i$:
$\text{Maximize:} \quad \mathcal{L}(\boldsymbol{\alpha}) = \sum_{i=1}^l \alpha_i - \frac{1}{2} \sum_{i=1}^l \sum_{j=1}^l \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) - \frac{1}{4C} \sum_{i=1}^l \alpha_i^2$,
$\text{subject to:} \quad \sum_{i=1}^l \alpha_i y_i = 0, \quad \alpha_i \geq 0$.
In contrast to the standard hinge loss, there is no upper bound $\alpha_i \leq 2C$; the quadratic penalty instead contributes the regularizing term $-\frac{1}{4C} \sum_i \alpha_i^2$. This is the final form of the dual problem.
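A minimal numerical sketch of such a simplified quadratic-hinge dual (illustrative only; the toy data, the choice of $C$, the SciPy solver, and the bias-recovery step are assumptions, not part of the assignment): maximize $\sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j) - \frac{1}{4C}\sum_i \alpha_i^2$ subject to $\alpha_i \geq 0$ and $\sum_i \alpha_i y_i = 0$.

```python
# Minimal sketch: solve the simplified quadratic-hinge dual on assumed toy data.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 10.0
K = X @ X.T  # Gram matrix of inner products <x_i, x_j>

def neg_dual(a):
    # negative of: sum(a) - 1/2 (a*y)^T K (a*y) - 1/(4C) sum(a^2)
    return -(a.sum() - 0.5 * (a * y) @ K @ (a * y) - (a**2).sum() / (4 * C))

res = minimize(
    neg_dual,
    x0=np.full(4, 0.1),
    bounds=[(0, None)] * 4,                               # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ y}], # sum_i alpha_i y_i = 0
)
a = res.x
w = (a * y) @ X  # primal weights: w = sum_i alpha_i y_i x_i
# Recover b from the most active constraint: y_i (w.x_i - b) = 1 - xi_i, xi_i = a_i/(2C)
i = int(np.argmax(a))
b = X[i] @ w - y[i] * (1 - a[i] / (2 * C))
print(np.sign(X @ w - b))  # should reproduce y on this separable toy set
```

On this symmetric toy set the solver should put weight only on the two inner points and recover a separating hyperplane through the origin.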
Task 2: Introduction to Support Vector Machines
The aim of the following task is to equip you with some intuition concerning the application of different SVMs to an easy data set.
You should also observe how different versions of the SVMs with different hyperparameters react to additional noise. To this end we provided you with a function plot_data which is intended to create proper visualizations of important characteristics of linear and nonlinear SVMs, like the decision border and support vectors.
The usual routine for applying SVMs in Python, which is also used here, is given by the following sklearn-package: SVC.
Your first task is to get familiar with the plot_data function by applying it to the easy data set radial_data.csv. Iterate over different kernels and hyperparameters (consider the documentation for details), i.e. perform the following tasks:
- Code 2.1:
- Plot the dataset using the provided functions. (Just execute the existing code.)
- Using the sklearn-package mentioned above, write a function `iter_degree` that applies an SVM with a polynomial kernel and `C=10` to the data and iterates over the degrees (i.e. the hyperparameter). This function should then plot each fitted model alongside the data using the provided function `plot_data`. `iter_degree` must return a list of model parameters (in the form of dictionaries) for each model you initiated and fitted on our data. The output of `get_params()` is a dictionary; see the sklearn docs. Utilize the `plot_data` function by passing each fitted model and the data itself in order to visualize the results. You will get 0 points if you choose to use your own visualization function over the provided one!
- Question 2.1:
- Answer some questions about the resulting plots.
- Code 2.2:
- Now, write a similar function called `iter_C` that applies an SVM with an rbf kernel and `degree=3` to the data, iterates over the given C values (i.e. the hyperparameter), and plots the fitted models alongside the data. This function should return something similar to `iter_degree`. Again, you will get 0 points if you choose to use your own visualization function over the provided one!
- Question 2.2:
- Again, answer some questions about your results from the previous task.
# Nothing to do here, just run the cell.
import numpy as np
import matplotlib.pyplot as plt
import warnings
from sklearn import svm
warnings.filterwarnings("ignore")
# Nothing to do here, just run the cell.
def load_data():
"""Function allows to load data from csv"""
Z = np.genfromtxt('radial_data.csv', delimiter=',')
return Z[:,:-1], Z[:,-1]
def get_meshgrid(X, resolution):
"""Function creates space/grid. Used for plotting"""
s = np.max(np.abs(X))*1.05
ls = np.linspace(-s, s, resolution)
X1,X2 = np.meshgrid(ls, ls, sparse=False)
return np.c_[X1.ravel(), X2.ravel()]
def plot_data(
X,
y,
model = None,
plot_borders = True,
plot_classification = True,
plot_support_vectors = True,
plot_size = 7,
resolution = 500,
title = 'Data visualization',
color = ['blue','orange']
):
"""Plot data points and an optional model's decision boundaries and classifications.
Parameters
----------
X : array-like of shape (n_samples, 2)
Input data points with two features.
y : array-like of shape (n_samples,)
Target labels for the data points. Assumes binary classification with labels
+1 and -1.
model : object, optional
A fitted classification model with `predict`, `decision_function`, `support_vectors_`,
`support_`, `kernel`, and possibly `degree` and `C` attributes if applicable. The model
should be compatible with scikit-learn's SVM models.
plot_borders : bool, default=True
If True, plots the decision boundaries of the model.
plot_classification : bool, default=True
If True, plots the model's classification regions.
plot_support_vectors : bool, default=True
If True, highlights the support vectors used by the model, if applicable.
plot_size : int, default=7
Controls the size of the plot.
resolution : int, default=500
The resolution of the mesh grid used to plot decision boundaries and classification regions.
title : str, default='Data visualization'
Title for the plot.
color : list of str, default=['blue', 'orange']
Colors for the two classes. The first color corresponds to the positive class (+1),
and the second color corresponds to the negative class (-1).
"""
if model is not None:#if you want to plot model
if plot_classification and plot_borders:
col=2 #if you want to plot model and borders
else :
col=1 #if you want to plot model only
fig,axs = plt.subplots(1,col,figsize=(plot_size*col,plot_size))
grid = get_meshgrid(X, resolution)
V = model.support_vectors_
mask_sv = model.support_ #np.where(np.isin(X[:,0],V[:,0]))[0]
kernel = model.kernel
if kernel == 'poly':
title = f"kernel: {kernel} - degree: {model.degree} - cost:{model.C}"
elif kernel == 'rbf':
title = f"kernel: {kernel}"
if model.gamma != "auto_deprecated" :
title+= f" - gamma: {model.gamma}"
title += f" - cost: {model.C}"
for i in range(col):
if col>1:
ax = axs[i]
else:
ax = axs
ax.set_aspect('equal')
if i==0 and plot_borders:
ax.set_title('Margins - ' + title,fontsize=plot_size*2)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
borders = model.decision_function(grid)
mask_pos = borders >= 1
mask_neg = borders <= -1
ax.scatter(grid[mask_pos,0], grid[mask_pos,1], c=color[0], alpha=0.01, s=10)
ax.scatter(grid[mask_neg,0], grid[mask_neg,1], c=color[1], alpha=0.01, s=10)
ax.scatter(X[mask_sv,0], X[mask_sv,1], c='g',label= str(np.sum(model.n_support_)) +' SV',s=40,marker='o')
if plot_classification and (i==1 or not plot_borders):
ax.set_title('Classification - ' + title,fontsize=plot_size*2)
ax.set_xlabel("$x_1$")
ax.set_ylabel("$x_2$")
classification = model.predict(grid)
mask_pos = classification > 0
mask_neg = classification < 0
ax.scatter(grid[mask_pos,0], grid[mask_pos,1], c=color[0], alpha=0.01, s=10)
ax.scatter(grid[mask_neg,0], grid[mask_neg,1], c=color[1], alpha=0.01, s=10)
classification = model.predict(X)
mask_wrong = classification != y
ax.scatter(X[mask_wrong,0], X[mask_wrong,1], c='magenta',label=str(np.sum(mask_wrong)) + ' faults',s=40,marker='o')
m = y > 0
ax.scatter(X[m,0], X[m,1], c=color[0],label='class +1',s=10)
m = np.logical_not(m)
ax.scatter(X[m,0], X[m,1], c=color[1],label='class -1',s=10)
ax.legend(loc='lower left', fontsize=plot_size*1.5)
else:
fig,axs = plt.subplots(1,1,figsize=(plot_size,plot_size))
axs.set_aspect('equal')
m = y > 0
axs.scatter(X[m,0], X[m,1], c=color[0],label='class +1',s=10)
m = np.logical_not(m)
axs.scatter(X[m,0], X[m,1], c=color[1],label='class -1',s=10)
axs.legend(loc='lower left', fontsize=plot_size*1.5)
plt.xlabel("$x_1$")
plt.ylabel("$x_2$")
plt.title(title, fontsize=plot_size*2)
plt.show()
Code 2.1 (5 Points):
# Nothing to do here, just run the cell.
X, y = load_data()
# Nothing to do here, just run the cell.
plot_data(X, y)
def iter_degree(degrees: range, X: np.ndarray, y: np.ndarray) -> list[dict]:
"""Fit an SVM with polynomial kernel and C=10 of varying degrees and plot each variation.
This function iterates over a specified range of polynomial degrees, fits an
SVM model with each degree, and plots the decision boundaries. The parameters
of each model are stored in a dictionary and returned in a list.
Parameters
----------
degrees : range
Range of integer values representing the polynomial degrees to use for the SVM model.
X : (N, 2) np.ndarray
Data matrix containing the training data.
y : (N,) np.ndarray
Target vector containing the labels.
Returns
-------
list of dict
A list of dictionaries, where each dictionary contains the parameters of an SVM model
with a specific polynomial degree. The length of the list is equal to the length of `degrees`.
"""
# YOUR CODE HERE
models_params = []
for degree in degrees:
# Initialize the SVM with polynomial kernel and C=10
clf = svm.SVC(kernel='poly', degree=degree, C=10, gamma='auto')
# Fit the model
clf.fit(X, y)
# Plot the model with the data
plot_data(X, y, model=clf)
# Get the model parameters and append to the list
models_params.append(clf.get_params())
return models_params
# Nothing to do here, just run the cell.
degrees = range(1, 6)
models_degree = iter_degree(degrees, X, y)
# DO NOT DELETE THIS CELL!
assert isinstance(models_degree, list), "The output of iter_degree is not a list!"
assert len(models_degree) == len(degrees), "Some models are missing!"
assert all(isinstance(model, dict) for model in models_degree), "At least one of the elements in the list is not a dictionary!"
Question 2.1 (6 Points):
What observations can you make from your plots? (several answers may be correct)
a1_) The SVM with polynomial degree 2 already seems to do quite well on the data set.
b1_) The higher the polynomial degree, the better the classifier.
c1_) A very high number of support vectors seems to be an indicator of a good choice of the kernel.
d1_) There is only a small difference between the pictures that were produced by polynomial kernels of even degree.
e1_) For kernels with an odd degree, the number of misclassified samples decreases with an increasing degree.
f1_) For odd degrees, one can see that the model complexity increases with increasing degree.
To answer the question, assign True or False boolean values to variables in the next cell. For example, if you think that a1_) is correct, define a variable a1_ and set it to True, the same applies to b1_) and the other options. A non-correctly answered question as well as no answer (i.e. answer “None”) yields 0 points for a specific question.
# YOUR CODE HERE
a1_ = True
b1_ = False
c1_ = False
d1_ = False
e1_ = False
f1_ = True
# DO NOT DELETE THIS CELL!
assert a1_ is not None, "Store True/False!"
assert a1_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert b1_ is not None, "Store True/False!"
assert b1_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert c1_ is not None, "Store True/False!"
assert c1_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert d1_ is not None, "Store True/False!"
assert d1_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert e1_ is not None, "Store True/False!"
assert e1_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert f1_ is not None, "Store True/False!"
assert f1_ in [True, False], "Invalid Answer!"
Code 2.2 (5 Points):
def iter_C(C_values: list[float], X: np.ndarray, y: np.ndarray) -> list[dict]:
"""Iteratively trains SVM models with different values of the regularization parameter `C`,
plots the decision boundaries for each model, and returns a list of model parameters.
Parameters
----------
C_values : list of float
A list of values for the regularization parameter `C` to be used in training the SVM models.
X : (N, 2) np.ndarray
Data matrix containing the training data.
y : (N,) np.ndarray
Target vector containing the labels.
Returns
-------
list of dict
A list of dictionaries, where each dictionary contains the parameters of the trained SVM model
for a specific value of `C`.
"""
# YOUR CODE HERE
models_params = []
for C in C_values:
# Initialize the SVM with RBF kernel and specified C
clf = svm.SVC(kernel='rbf', C=C, gamma='auto')
# Fit the model
clf.fit(X, y)
# Plot the model with the data
plot_data(X, y, model=clf)
# Get the model parameters and append to the list
models_params.append(clf.get_params())
return models_params
# Nothing to do here, just run the cell.
C_values = [0.1, 1, 10, 100, 1000]
models_C = iter_C(C_values, X, y)
# DO NOT DELETE THIS CELL!
assert isinstance(models_C, list), "The output of iter_C is not a list!"
assert len(models_C) == len(C_values), "Some models are missing!"
assert all(isinstance(model, dict) for model in models_C), "At least one of the elements in the list is not a dictionary!"
Question 2.2 (6 points):
What observations can you make from your plots? Evaluate whether the given statements are true or false (several options may be correct):
a2_) The higher the cost, the more support vectors we have.
b2_) The decision borders don't change drastically with increasing $C$, only the number of support vectors does.
c2_) The lower the cost, the larger the margin.
To answer the question, assign True or False boolean values to variables in the next cell. For example, if you think that a2_) is correct, define a variable a2_ and set it to True, the same applies to b2_) and the other option. A non-correctly answered question as well as no answer (i.e. answer “None”) yields 0 points for a specific question.
# YOUR CODE HERE
a2_ = True
b2_ = False
c2_ = True
# DO NOT DELETE THIS CELL!
assert a2_ is not None, "Store True/False!"
assert a2_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert b2_ is not None, "Store True/False!"
assert b2_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert c2_ is not None, "Store True/False!"
assert c2_ in [True, False], "Invalid Answer!"
Task 3: SVM with new data
- Code 3.1:
- Use the given function (polar-to-cartesian) `pol2cart()` in order to generate 100 two-dimensional data points which are uniformly distributed across the area of a circle with a radius of `r=0.25`. Label these points with `-1` and add them to the given `X` data matrix and `y` label vector. Note: In polar coordinates, in order to get uniformly distributed samples, you sample the angle uniformly, but be careful with the radius. If you simply sample the radius uniformly, you will not end up with uniformly sampled points within the circle. Hint: Have a look at the formula for the area of a circle and use np.random.rand for sampling points from a uniform distribution.
- Code 3.2:
- Now use an rbf kernel (which is the default option for `sklearn.svm.SVC`) and again play around with the parameters to explore the effects on the classification performance by appropriately using the `plot_data` function. Write a function `iter_gamma_C`, analogous to `iter_degree` and `iter_C`. `iter_gamma_C` iterates over different costs $C$ and over different values of $\gamma := 1/(2\sigma^2)$ (compare with the RBF definition in the lecture slides).
- Question 3.2:
- Answer some questions about your plots.
Code 3.1 (7 Points):
# Nothing to do here, just run the cell.
def pol2cart(r: np.ndarray, phi: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
"""Function that converts arrays of polar coordinates to arrays of cartesian coordinates."""
x = r * np.cos(phi)
y = r * np.sin(phi)
return x, y
def create_new_data(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
"""Generates 100 uniformly distributed 2D data points within a circle of radius 0.25,
labels them as -1, and appends them to the existing data.
Parameters
----------
X : (N, 2) np.ndarray
Original data matrix.
y : (N,) np.ndarray
Original label vector.
Returns
-------
tuple of np.ndarray
- X_new : (N+100, 2) np.ndarray
New data matrix.
- y_new : (N+100,) np.ndarray
New label vector.
"""
# YOUR CODE HERE
num_new_points = 100
radius = 0.25
# To ensure uniform distribution within the circle:
# Sample the radius with sqrt to account for area
rand_r = radius * np.sqrt(np.random.rand(num_new_points))
rand_phi = np.random.uniform(0, 2*np.pi, num_new_points)
x_new, y_new_coords = pol2cart(rand_r, rand_phi)
new_points = np.vstack((x_new, y_new_coords)).T
new_labels = -1 * np.ones(num_new_points)
# Append to existing data
X_new = np.vstack((X, new_points))
y_new = np.concatenate((y, new_labels))
return X_new, y_new
# DO NOT DELETE THIS CELL!
X_new, y_new = create_new_data(X, y)
assert isinstance(X_new, np.ndarray) and isinstance(y_new, np.ndarray), "Both outputs must be a np.ndarray!"
assert X_new.shape == (X.shape[0]+100, X.shape[1]), "You should add exactly 100 data points!"
assert all(y_new[-100:] == -1), "The new samples should have the label -1!"
assert all(np.sqrt(X_new[-100:, 0]**2 + X_new[-100:, 1]**2) <= 0.25), "Some samples do not lie within a circle of radius 0.25!"
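As an optional illustration of the sqrt trick from the hint above (not a graded cell): for points uniform on a disk of radius $R$, the mean distance from the center is $2R/3$, while sampling the radius itself uniformly concentrates points near the center and yields a mean of $R/2$.

```python
# Optional illustration: compare area-uniform sampling (sqrt trick) with naive
# radius-uniform sampling on a disk of radius R = 0.25.
import numpy as np

rng = np.random.default_rng(0)
R, n = 0.25, 100_000
r_correct = R * np.sqrt(rng.random(n))  # area-uniform: mean distance 2R/3
r_naive = R * rng.random(n)             # radius-uniform: mean distance R/2
print(r_correct.mean(), 2 * R / 3)
print(r_naive.mean(), R / 2)
```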
# Nothing to do here, just run the cell.
plot_data(X_new, y_new)
Code 3.2 (6 Points):
def iter_gamma_C(gamma_range: list[float], C_values: list[float], X: np.ndarray, y: np.ndarray) -> list[dict]:
"""Fits an SVM model for each combination of gamma and C values and plots the results.
Parameters
----------
gamma_range : list of float
List of gamma values for the SVM model.
C_values : list of float
List of regularization parameter (C) values for the SVM model.
X : (N, 2) np.ndarray
Data matrix.
y : (N,) np.ndarray
Label vector.
Returns
-------
list of dict
A list of dictionaries containing the parameters of each fitted SVM model.
"""
# YOUR CODE HERE
models_params = []
for gamma in gamma_range:
for C in C_values:
# Initialize the SVM with RBF kernel, specified gamma and C
clf = svm.SVC(kernel='rbf', C=C, gamma=gamma)
# Fit the model
clf.fit(X, y)
# Plot the model with the data
plot_data(X, y, model=clf)
# Get the model parameters and append to the list
models_params.append(clf.get_params())
return models_params
# Nothing to do here, just run the cell.
cost_values = [10**i for i in range(-1, 4)]
gamma_values = [0.1, 0.5 ,0.9]
models_gamma_C = iter_gamma_C(gamma_values, cost_values, X_new, y_new)
# DO NOT DELETE THIS CELL!
assert isinstance(models_gamma_C, list), "The output of iter_gamma_C is not a list!"
assert len(models_gamma_C) == len(cost_values) * len(gamma_values), "Some models are missing!"
assert all(isinstance(model, dict) for model in models_gamma_C), "At least one of the elements in the list is not a dictionary!"
Question 3.2 (5 Points):
What observations can you make from your plots? Evaluate whether the given statements are true or false (several options may be correct):
a3_) For a fixed $\gamma$, the larger the cost $C$, the higher the number of support vectors.
b3_) For relatively large $C$ and relatively large $\gamma$ (say $C \geq 100$ and $\gamma > 0.5$), enlarging $\gamma$ further doesn't improve your performance significantly.
c3_) For fixed $C$, increasing $\gamma$ (i.e. decreasing the width of the Gaussian) typically increases the model complexity of the SVM.
d3_) For a fixed gamma value, the higher the cost, the smaller the margin.
To answer the question, assign True or False boolean values to variables in the next cell. For example, if you think that a3_) is correct, define a variable a3_ and set it to True, the same applies to b3_) and the other options. A non-correctly answered question as well as no answer (i.e. answer “None”) yields 0 points for a specific question.
# YOUR CODE HERE
a3_ = True
b3_ = True
c3_ = True
d3_ = True
# DO NOT DELETE THIS CELL!
assert a3_ is not None, "Store True/False!"
assert a3_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert b3_ is not None, "Store True/False!"
assert b3_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert c3_ is not None, "Store True/False!"
assert c3_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert d3_ is not None, "Store True/False!"
assert d3_ in [True, False], "Invalid Answer!"
Task 4: SVM with an rbf kernel and an outlier
Finally, we want to investigate, how the rbf kernel classifier reacts to outliers.
- Code 4.1:
- To this end, implement the function `add_point` that adds a single (outlier) point to your new data set `X_new`, `y_new` (i.e. the data with the additional disk with $-1$ labels) and label this point with $y=+1$.
- We are going to use an rbf kernel and play around with the parameters to explore the effects on the classification performance by again appropriately using the `iter_gamma_C` function. We are trying out small costs $C \sim 0.1$ and large costs $C \sim 1000$ and also iterate over different values of $\gamma$, ranging from $0.1$ to $1$.
- Question 4.1:
- One last time, answer some questions about the resulting plots.
Code 4.1 (5 Points):
def add_point(X: np.ndarray, y: np.ndarray, data_point: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
""" Add a new data point to the input arrays.
This function appends a new `data_point` to the feature array `X`
and a label `1` to the target array `y`.
Parameters
----------
X : (N, 2) np.ndarray
The data matrix containing the original data points.
y : (N,) np.ndarray
The target vector containing the original labels.
data_point : (1, 2) np.ndarray
The new datapoint to add.
Returns
-------
tuple of np.ndarray
- X_new : (N+1, 2) np.ndarray
The updated data matrix with the new point added.
- y_new : (N+1,) np.ndarray
The updated target vector with the new label added.
"""
# YOUR CODE HERE
# Append the new data point to X
X_new = np.vstack((X, data_point))
# Append the label +1 to y
y_new = np.concatenate((y, [1]))
return X_new, y_new
outlier = np.array([[1.8, -1.3]])
X_with_outlier, y_with_outlier = add_point(X_new, y_new, outlier)
assert isinstance(X_with_outlier, np.ndarray) and isinstance(y_with_outlier, np.ndarray), "Both outputs of add_point should be of type np.ndarray!"
assert X_with_outlier.shape == (X_new.shape[0]+1, X_new.shape[1]), "You should add exactly 1 data point!"
np.testing.assert_array_equal(X_with_outlier[-1:], outlier)
assert y_with_outlier[-1] == 1, "The label of the outlier should be +1!"
plot_data(X_with_outlier, y_with_outlier)
_ = iter_gamma_C(gamma_values, cost_values, X_with_outlier, y_with_outlier)
Question 4.1 (5 Points):
What observations can you make from your plots? Tick the correct boxes:
a4_) For relatively low costs (e.g. $C\le 1$) and an appropriately chosen $\gamma$ (e.g. $0.9$) the classifier correctly classifies all points except for the outlier.
b4_) For relatively high costs (e.g. $C\geq 100$) the classifier always, i.e. independent of $\gamma$, shows a region of the positive class near the outlier.
c4_) Classifiers with high costs ($C\geq 100$ say) and high $\gamma$ ($ \gamma \geq 0.5$) are susceptible to overfitting.
d4_) For relatively high costs ($C\geq 100$) and small $\gamma$ (0.1), the SVM correctly classifies except for the outlier.
To answer the question, assign True or False boolean values to variables in the next cell. For example, if you think that a4_) is correct, define a variable a4_ and set it to True, the same applies to b4_) and the other options. A non-correctly answered question as well as no answer (i.e. answer “None”) yields 0 points for a specific question.
# YOUR CODE HERE
a4_ = True
b4_ = True
c4_ = True
d4_ = False
# DO NOT DELETE THIS CELL!
assert a4_ is not None, "Store True/False!"
assert a4_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert b4_ is not None, "Store True/False!"
assert b4_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert c4_ is not None, "Store True/False!"
assert c4_ in [True, False], "Invalid Answer!"
# DO NOT DELETE THIS CELL!
assert d4_ is not None, "Store True/False!"
assert d4_ in [True, False], "Invalid Answer!"